Reusing Old Policies to Accelerate Learning on New
Author
Abstract
We consider the reuse of policies for previous MDPs in learning on a new MDP, under the assumption that the vector of parameters of each MDP is drawn from a fixed probability distribution. We use the options framework, in which an option consists of a set of initiation states, a policy, and a termination condition. We use an option called a reuse option, for which the set of initiation states is the set of all states, the policy is a combination of policies from the old MDPs, and the termination condition is based on the number of time steps since the option was initiated. Given policies for m of the MDPs from the distribution, we construct reuse options from the policies and compare performance on an (m+1)st MDP both with and without various reuse options. We find that reuse options can speed initial learning of the (m+1)st task. We also present a distribution of MDPs for which reuse options can slow initial learning. We discuss reasons for this and suggest other ways to design reuse options.
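The three components of a reuse option described above can be illustrated with a minimal Python sketch. The abstract does not say how the old policies are combined or how the time limit is chosen, so the majority vote in `act` and the fixed `max_steps` cutoff are assumptions made for illustration only, and the class and method names (`ReuseOption`, `can_initiate`, `act`, `should_terminate`) are hypothetical rather than taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable
Policy = Callable[[State], Action]


@dataclass
class ReuseOption:
    """Illustrative reuse option built from policies learned on old MDPs."""

    old_policies: Sequence[Policy]  # policies from the m previous MDPs
    max_steps: int                  # assumed fixed time limit for the option
    steps_taken: int = 0

    def can_initiate(self, state: State) -> bool:
        # The initiation set is the set of all states.
        return True

    def initiate(self) -> None:
        # Reset the step counter each time the option is invoked.
        self.steps_taken = 0

    def act(self, state: State) -> Action:
        # Assumed combination rule: majority vote over the old policies.
        # Ties go to the action proposed first.
        votes = Counter(policy(state) for policy in self.old_policies)
        self.steps_taken += 1
        return votes.most_common(1)[0][0]

    def should_terminate(self) -> bool:
        # Termination depends only on the time steps since initiation.
        return self.steps_taken >= self.max_steps
```

In a learner on the new MDP, such an option would sit alongside the primitive actions as one more choice available in every state; other combination rules, such as sampling one old policy at random per invocation, would fit the same interface.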
Similar resources
Reusing Risk-Aware Stochastic Abstract Policies in Robotic Navigation Learning
In this paper we improve learning performance of a risk-aware robot facing navigation tasks by employing transfer learning; that is, we use information from a previously solved task to accelerate learning in a new task. To do so, we transfer risk-aware memoryless stochastic abstract policies into a new task. We show how to incorporate risk-awareness into robotic navigation tasks, in particular wh...
Exploration Strategies for Reusing Past Policies
The balance between exploring new actions and states and exploiting the knowledge acquired while learning has been widely studied in Reinforcement Learning. There is also a clear interest in how past policies that solve different tasks may help to solve a new one, which also requires a balance between exploring, exploiting past policies, and exploiting the current one. In this paper, we show that re...
Reusing Learned Policies Between Similar Problems
We are interested in being able to leverage policy learning in complex problems upon policies learned for similar problems. This capability is particularly important in robot learning, where gathering data is expensive and time-consuming, and prohibits directly applying reinforcement learning. In this case, we would like to be able to transfer knowledge from a simulator, which may have an inacc...
Learning and Reusing Goal-Specific Policies for Goal-Driven Autonomy
In certain adversarial environments, reinforcement learning (RL) techniques require a prohibitively large number of episodes to learn a high-performing strategy for action selection. For example, Q-learning is particularly slow to learn a policy to win complex strategy games. We propose GRL, the first GDA system capable of learning and reusing goal-specific policies. GRL is a case-based goal-dri...
Reusing Learning Objects and the Impact of Web 3.0 on e-Learning Platforms
E-Learning promotes the exchange of experiences and knowledge that facilitate student learning without the time and space restrictions imposed by traditional models. The potential for reusability is a primary attraction for educators when discussing learning objects. Reusing learning objects is as old as retelling a story or making use of libraries and textbooks. In electronic for...